Abstract: We construct a parametrization of the lepton energy spectrum in inclusive
semileptonic decays of B mesons, based on the available experimental information: moments
of the spectrum with cuts, their errors and their correlations, together with kinematical
constraints. The result is obtained in the form of a Monte Carlo sample of neural
networks trained on replicas of the experimental data, which represents the probability
density in the space of lepton energy spectra. This parametrization is then used to extract
the b quark mass in a way that minimizes theoretical uncertainties, for which the
value $m_b^{1S} = 4.84 \pm 0.14^{\rm exp} \pm 0.05^{\rm th}$ GeV is obtained.
1. Introduction and motivation

In the last decade the field of B meson physics has been the object of a wealth of studies (see
Ref. [1] and references therein), motivated by the high precision measurements from the B
factories, Belle and Babar. In particular, the inclusive semileptonic decays $B \to X l \nu$, where
X stands for a hadronic system, have received a lot of attention, both on the theoretical [2,3]
and on the experimental side (see [4] for an up-to-date summary of the present situation),
due to their paramount importance for the determination of the CKM matrix elements, and also
because they provide important information on the underlying strong interaction dynamics.

It is well known that differential distributions in inclusive semileptonic decays of heavy 
mesons can be computed by means of the Operator Product Expansion [5,6]. The resulting 
distributions are singular and can only be compared with the experimentally measured 
distributions after smearing over a sufficiently large interval. 
In principle one can measure not only the branching ratios of these modes but also
the full differential spectra in certain kinematical variables, like the lepton energy or the
hadronic invariant mass. However, practical considerations dictate that the observables
actually measured are convolutions of these spectra with suitable weight functions and given
kinematical cuts. The most common case is when these observables are moments of the
spectra. On the other hand, as mentioned before, there is no pointwise theoretical
prediction for these spectra, since the output of the theoretical computation is not an ordinary
function but rather a distribution, which is a general feature of partonic cross sections.
Therefore only integrals over a sufficiently large energy range can be reliably computed
in perturbation theory, and one has to smear the theoretical prediction for the
spectrum to compare it with the experimental measurements.

For all the above reasons, it would be valuable to obtain from experimental data
the full spectrum with uncertainties, to allow a more general comparison
with theoretical predictions. Such a parametrization of the spectra would, for example, allow
a comparison of general convolutions of the spectra with arbitrary kinematical cuts with
theoretical computations, even if these convolutions have not been measured experimentally.
Another application could be to study possible violations of quark-hadron duality in
these lepton spectra, or to estimate the size of higher order corrections, both perturbative
and nonperturbative.

Our purpose in this work is to allow for a more general comparison of the theoretical 
predictions with the experimental data. With this aim a parametrization of the lepton 
energy spectrum from available experimental information on its moments is constructed, 
supplemented by constraints from the kinematics of the process. Traditional strategies, 
like fits with functional forms, suffer from the well known problem of parametrization 
bias and, moreover, do not allow a determination of the uncertainties associated to the 
parametrization, so new suitable strategies must be developed to address this problem in 
a statistically meaningful way. 

Recently, a novel approach to the problem of the parametrization of experimental data 
in an unbiased way with a faithful estimation of the uncertainties was proposed, based on 
the combination of Monte Carlo techniques and neural networks as basic interpolating 
tools, which determines the probability density of the parametrization. This technique has 
been successfully applied to the parametrization of deep-inelastic structure functions [7,8],
spectral functions from hadronic tau decays [9] and parton distribution functions^1 [11,13].

This success motivated us to implement this technique in the context of B physics. 
Therefore, in this work we construct an unbiased parametrization of the lepton energy
spectrum in semileptonic B meson decays with a faithful estimation of the uncertainties.
Since in Refs. [7-9, 11] the technique that will be used in this work is discussed in detail, 
here only those aspects which are special to the present application will be emphasized. 
Fig. 1 shows a diagram that summarizes our parametrization strategy. 

As a byproduct of our analysis an extraction of the heavy quark nonperturbative
parameters $\bar{\Lambda}_{1S}$ and $\lambda_1$ will be performed using a technique that ensures that the associated
theoretical uncertainties are minimized [14]. It will be shown that in this case the dominant
uncertainties in the determination of these parameters turn out to be the experimental
uncertainties, that is, those associated with the uncertainties in the parametrization of the
spectrum.

^1 Many ideas that appear in the present work will be developed in more detail in a forthcoming
publication [10].

Figure 1: Diagram that represents schematically our strategy to parametrize the lepton energy
spectrum with a Monte Carlo sample of neural networks.

Summarizing, there are three main motivations to construct a neural network parametrization
of the lepton energy spectrum in B meson decays. The first one is to generalize the
approach of Refs. [7-10] to the construction of an unbiased determination of
physical quantities, with a faithful estimation of their uncertainties, from experimental data
in the case for which the only available information on the quantity comes through truncated
moments, as is the case for the lepton energy spectrum. The second is to show how this
parametrization allows a more general comparison of theoretical predictions with data,
since from our parametrization one can extract, for example, moments that have not been
measured, like non-integer moments, higher order moments or moments with a large cut in
the lepton energy $E_0$, and use them for several purposes. In this work we examine two
such applications: the comparison of the theoretical accuracy with which higher order
moments or moments with large $E_0$ are computed with that of experimental
measurements (Section 6.2), and novel methods to determine nonperturbative parameters
like $m_b(m_b)$ from non-integer moments (Section 7). Finally, the set of techniques described
in the present work allows for a straightforward generalization to other relevant problems in
B meson physics, like the determination of the B meson shape function $S(\omega)$ from experimental
data [12].

Figure 2: Semileptonic decay of a B meson into a charmed final state.

The outline of this paper is as follows: in Section 2 we summarize the theoretical 
aspects of semileptonic B meson decays, and in Section 3 the experimental data that 
will be used. In Section 4 we describe the generation of Monte Carlo replicas of the 
experimental data, and in Section 5 the process of neural network training. In Section 6 
we present the results that are obtained for the lepton energy spectrum and in Section 7
the determination of the nonperturbative parameters $\bar{\Lambda}_{1S}$ and $\lambda_1$. Finally, in Section 8 we
conclude and briefly sketch possible new applications of our strategy to other problems in 
the context of B meson physics. Two appendices summarize the most technical details of 
the neural network parametrization. 

2. Theory overview 

In this work the inclusive semileptonic decays of B mesons with charmed final states will be
considered. Therefore, the process that will be analyzed is $B \to X_c l \nu$, which is represented
in Fig. 2. The differential decay rate for this process,

$$ B(p) \to l(p_l) + \nu(p_\nu) + X_c(r) , \tag{2.1} $$

depends on three different kinematical variables, $q^2$, $r^2$ and $E_l$, where $q = p_l + p_\nu$ is the total
four-momentum of the leptonic system, $r = p - q$ is the four-momentum of the charmed
hadronic final state, with invariant mass $r^2 = M_X^2$, and $E_l$ is the lepton energy in the rest
frame of the B meson. This triple differential distribution can be decomposed, taking into
account the kinematics of the process and the symmetries of the theory, in terms of three
structure functions $W_i$,

$$ \frac{d^3\Gamma}{d\hat{q}^2\, d\hat{r}^2\, d\hat{E}_l} = \ldots \left[ 2 (v\cdot\hat{p}_l)^2 - 2\, v\cdot\hat{p}_l \; v\cdot\hat{q} + \frac{\hat{q}^2}{2} \right] W_2(\hat{q}^2, \hat{u}) + \ldots \left( 2\, v\cdot\hat{p}_l - v\cdot\hat{q} \right) W_3(\hat{q}^2, \hat{u}) , \tag{2.2} $$

where $u = r^2 - m_c^2$, $v = p/m_B$, and the quantities with a hat are dimensionless quantities
normalized to $m_b$. All the structure functions $W_i(\hat{q}^2, \hat{u})$ have both a perturbative expansion
in powers of $\alpha_s$ and a nonperturbative expansion in powers of $1/m_b$, which can be computed
in the framework of the heavy quark expansion. For example, the complete set of $O(\alpha_s)$
corrections for all the differential distributions that can be constructed from Eq. 2.2 with
arbitrary kinematical cuts has recently become available [15,16].

The most general observables that are accessible from the experiments, as will be
discussed below, are convolutions of differential distributions with suitable weight functions
over a large enough range of energy, with kinematical cuts. A particular case of these
observables are the moments of differential decay distributions. In this work the focus will
be on leptonic moments, defined as

$$ L_n(E_0, \mu) = \int_{E_0}^{E_{\max}} dE_l\, (E_l - \mu)^n \int dq^2\, dr^2\, \frac{d^3\Gamma}{dq^2\, dr^2\, dE_l}(q^2, r^2, E_l) , \tag{2.3} $$

where $E_0$ is a lower cut on the lepton energy, and $E_{\max}$ is the maximum lepton energy allowed
by the kinematics of the process,

$$ E_{\max} = \frac{m_B^2 - m_D^2}{2 m_B} , \tag{2.4} $$

where $m_B$ is the average of the masses of the neutral and charged B mesons, and similarly for
$m_D$. The lower cut on the lepton energy in Eq. 2.3 is imposed by experimental requirements,
as will be discussed in the next section. The quantity that is going to be parametrized with
a Monte Carlo sample of neural networks, the lepton energy distribution, is defined as

$$ \frac{d\Gamma}{dE_l}(E_l) \equiv \int dq^2\, dr^2\, \frac{d^3\Gamma}{dq^2\, dr^2\, dE_l}(q^2, r^2, E_l) , \tag{2.5} $$

and is related to the observable leptonic moments via

$$ L_n(E_0, \mu) = \int_{E_0}^{E_{\max}} dE_l\, (E_l - \mu)^n\, \frac{d\Gamma}{dE_l}(E_l) . \tag{2.6} $$

The lepton energy spectrum, Eq. 2.5, in $B \to X_c l \nu$ decays, as has been discussed
before, can be expanded in a power series both in $\alpha_s$ and in $1/m_b$. The leading order
spectrum in both expansions is given by [6]

$$ \frac{d\Gamma}{dy}(B \to X_c l \nu) = \Gamma_0\, 2y^2 \left[ (1-f)^2 (1+2f)(2-y) + (1-f)^3 (1-y) \right] \theta(1 - y - \rho) . \tag{2.7} $$
The kinematic support of the spectrum at this leading order partonic level is

$$ E_l \in \left[ 0, \frac{m_b^2 - m_c^2}{2 m_b} \right] , \tag{2.10} $$

where the upper limit is modified by nonperturbative (hadronic) corrections. The leading
perturbative $O(\alpha_s)$ corrections to this spectrum have been known for some time [17], and
there are estimates of the size of higher order terms through the BLM expansion [18].
The leading nonperturbative $O(1/m_b^2)$ corrections to the lepton energy spectrum were
computed in Refs. [6,19] and the $O(1/m_b^3)$ corrections in Ref. [20].
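As an illustration of Eq. 2.7 and of the support in Eq. 2.10, the leading order spectrum can be evaluated numerically. The sketch below is only indicative: it assumes the standard definitions $y = 2E_l/m_b$, $\rho = m_c^2/m_b^2$ and $f = \rho/(1-y)$, which are not spelled out in the text above, it works in units of $\Gamma_0$, and the quark masses are illustrative values chosen for the example rather than numbers taken from this work.

```python
def lo_spectrum(y, rho):
    """Leading order dGamma/dy in units of Gamma_0, following Eq. 2.7.

    Assumed definitions (not stated explicitly in the text):
    y = 2*E_l/m_b, rho = m_c**2/m_b**2, f = rho/(1 - y).
    """
    # theta(1 - y - rho): the spectrum vanishes outside the kinematic support
    if y < 0.0 or y >= 1.0 - rho:
        return 0.0
    f = rho / (1.0 - y)
    return 2.0 * y**2 * ((1.0 - f)**2 * (1.0 + 2.0 * f) * (2.0 - y)
                         + (1.0 - f)**3 * (1.0 - y))


def e_max_partonic(m_b, m_c):
    """Upper endpoint of the partonic support, Eq. 2.10."""
    return (m_b**2 - m_c**2) / (2.0 * m_b)


# Illustrative quark masses in GeV (assumed values, for the example only).
m_b, m_c = 4.8, 1.4
rho = (m_c / m_b) ** 2
y_end = 2.0 * e_max_partonic(m_b, m_c) / m_b  # endpoint in the variable y
```

With these definitions the endpoint `y_end` coincides with $1 - \rho$, so the step function in Eq. 2.7 and the interval in Eq. 2.10 describe the same support.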

The total decay rate, obtained by integration of Eq. 2.5, admits the following heavy
quark expansion [6,21]:

$$ \Gamma(B \to X_c l \nu) = \Gamma_0 |V_{cb}|^2 (1 + A_{\rm ew}) A_{\rm pert} \left[ z_0(\rho) \left( 1 + \frac{\lambda_1}{2 m_b^2} \right) + g(\rho) \frac{\lambda_2}{2 m_b^2} + O\left( \frac{1}{m_b^3} \right) \right] , \tag{2.11} $$

which depends up to $O(1/m_b^2)$ on the nonperturbative parameters $\lambda_1$ and $\lambda_2$, and
where the phase space factors are given by

$$ z_0(\rho) = 1 - 8\rho + 8\rho^3 - \rho^4 - 12\rho^2 \ln\rho , \qquad \rho = \frac{m_c^2}{m_b^2} , \tag{2.12} $$

$$ g(\rho) = -9 + 24\rho - 72\rho^2 + 72\rho^3 - 15\rho^4 - 36\rho^2 \ln\rho , \tag{2.13} $$

and where $A_{\rm ew}$ stands for the electroweak corrections and $A_{\rm pert}$ for the QCD perturbative
corrections. Similar heavy quark expansions are available for the lepton energy moments
(see Ref. [2] and references therein).
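The phase space factors of Eqs. 2.12 and 2.13 are simple enough to check numerically. The following sketch evaluates them and the bracket of Eq. 2.11 truncated at $O(1/m_b^2)$; the function name `rate_bracket` and the way the bracket is assembled are assumptions made for this illustration, and any numerical inputs are placeholders rather than fitted values.

```python
import math


def z0(rho):
    """Phase space factor z_0(rho) of Eq. 2.12, with rho = m_c**2 / m_b**2."""
    return (1.0 - 8.0 * rho + 8.0 * rho**3 - rho**4
            - 12.0 * rho**2 * math.log(rho))


def g(rho):
    """Phase space factor g(rho) of Eq. 2.13."""
    return (-9.0 + 24.0 * rho - 72.0 * rho**2 + 72.0 * rho**3
            - 15.0 * rho**4 - 36.0 * rho**2 * math.log(rho))


def rate_bracket(rho, lambda1, lambda2, m_b):
    """Bracket of Eq. 2.11 up to O(1/m_b^2), as read in this sketch.

    lambda1 and lambda2 are in GeV^2 and m_b in GeV; the overall factor
    Gamma_0 |V_cb|^2 (1 + A_ew) A_pert is left out.
    """
    return (z0(rho) * (1.0 + lambda1 / (2.0 * m_b**2))
            + g(rho) * lambda2 / (2.0 * m_b**2))
```

Both factors vanish at $\rho = 1$, where the phase space closes, and $z_0 \to 1$ in the massless limit $\rho \to 0$, which provides a quick sanity check of the coefficients.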



3. Experimental data 

The experimental data that will be used in the present analysis consist of moments
with kinematical cuts of the lepton energy distribution in semileptonic B meson decays to
charmed final states, $B \to X_c l \nu$. These moments have recently been measured with great
accuracy at the B factories, Babar [22] and Belle [24], as well as by Cleo [25]. Therefore, in
the present analysis the latest data from these three experiments will be used. Data from
CDF [26] are not incorporated since they are restricted to hadronic moments.

As has been mentioned before, the main experimental difficulty in measuring
the lepton energy spectrum at low lepton energies is that in this region
the background from other decay modes dominates, and it is challenging
to disentangle the desired decay mode. Therefore, kinematical cuts have to be imposed
that remove the low $E_l$ region of the spectrum. Another relevant consideration is that
the change of reference frame, from the laboratory frame to the B meson rest frame, and
several experimental corrections, such as electroweak final state radiation, are
easier to perform in terms of moments of the distribution. Therefore the final published
measurements are moments of the lepton energy spectrum, Eq. 2.6, with different cuts on
the lepton energy, rather than the full spectrum itself.

Now the data that will be used for the present parametrization of the lepton energy
spectrum will be described. The Babar Collaboration [22] provides the partial branching
ratios,

$$ M_0(E_0) = \tau_B L_0(E_0, 0) = \tau_B \int_{E_0}^{E_{\max}} \frac{d\Gamma}{dE_l}(E_l)\, dE_l , \tag{3.1} $$

where $\tau_B$ is the average B meson lifetime [23], the first moment,

$$ M_1(E_0) = \frac{L_1(E_0, 0)}{L_0(E_0, 0)} , \tag{3.2} $$

and the central moments,

$$ M_n(E_0) = \frac{L_n(E_0, M_1(E_0))}{L_0(E_0, 0)} , \qquad n = 2, 3 , \tag{3.3} $$

for five different values of $E_0$ from 0.6 to 1.5 GeV, which makes a total of 20 data points.
The rationale for extracting Eq. 3.3 rather than for example

$$ \frac{L_n(E_0, 0)}{L_0(E_0, 0)} , \tag{3.4} $$

is that in the former case correlations between different moments are smaller and therefore
more independent information can be extracted from the measurements.
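To make the definitions of the measured quantities concrete, the sketch below computes the partial branching ratio, the first moment and the central moments from a given spectrum by numerical integration of Eq. 2.6. It assumes, as the text describes, that the first moment is normalized to the zeroth one and that the central moments are taken about $M_1(E_0)$; the function names, the midpoint rule and the default `tau_b = 1.0` are choices made for this illustration only.

```python
def truncated_moment(spectrum, e0, e_max, n, mu=0.0, steps=2000):
    """L_n(E0, mu) of Eq. 2.6, integrated with the midpoint rule."""
    h = (e_max - e0) / steps
    total = 0.0
    for i in range(steps):
        e_l = e0 + (i + 0.5) * h
        total += (e_l - mu) ** n * spectrum(e_l)
    return total * h


def measured_moments(spectrum, e0, e_max, tau_b=1.0):
    """Return (M0, M1, M2, M3) for a lower cut e0 on the lepton energy.

    M0 is the partial branching ratio, M1 the mean lepton energy above
    the cut, and M2, M3 the central moments about M1.
    """
    l0 = truncated_moment(spectrum, e0, e_max, 0)
    m0 = tau_b * l0
    m1 = truncated_moment(spectrum, e0, e_max, 1) / l0
    m2 = truncated_moment(spectrum, e0, e_max, 2, mu=m1) / l0
    m3 = truncated_moment(spectrum, e0, e_max, 3, mu=m1) / l0
    return m0, m1, m2, m3
```

For a flat toy spectrum on the interval from 0 to 2 one obtains a mean of 1, a variance of 1/3 and a vanishing third central moment, which is a convenient check of the implementation.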

The Belle Collaboration [24] provides the same moments^2, $M_n(E_0)$ for n = 1, n = 2
and n = 3. The difference with the Babar data is that the partial branching ratio, Eq.
3.1, is not measured, and that the Belle data cover a somewhat larger lepton energy range,
since the lowest value of $E_0$ in their data set is $E_0 = 0.4$ GeV. These moments, for six
different values of $E_0$ from 0.4 to 1.5 GeV, make up a total of 18 data points. Finally the
Cleo Collaboration [25] provides the moments $M_n(E_0)$ for n = 1, 2, for energies between
0.6 and 1.5 GeV, for a total of 20 data points (10 data points for n = 1 and 10 data points
for n = 2). The average correlations for this experiment are larger since measurements of
the same moment at different energies $E_0$ are highly correlated.

The three collaborations also provide the total and statistical errors, as well as the
correlations between different measurements. These features are summarized in Table 1.



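^2 For example, they define $M_1 = \langle E_l \rangle$; if one takes into account that the corresponding normalized
probability density is given by

$$ P(E_l) = \left( \int_{E_0}^{E_{\max}} \frac{d\Gamma}{dE_l}\, dE_l \right)^{-1} \frac{d\Gamma}{dE_l}(E_l) , \qquad E_0 \le E_l \le E_{\max} , \tag{3.5} $$

one ends up with Eq. 3.2, and similarly for the remaining moments.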
Experiment    $N_{\rm dat}$   n       $E_0$ (GeV)   $\langle\sigma_{\rm stat}\rangle$   $\langle\sigma_{\rm tot}\rangle$   $\langle\rho\rangle$
Babar [22]    20              0 - 3   0.6 - 1.5     6.0%                                8.0%                               0.50
Belle [24]    18              1 - 3   0.4 - 1.5     15.0%                               16.0%                              0.34
Cleo [25]     20              1 - 2   0.6 - 1.5     1.0%                                1.3%                               0.65

Table 1: Features of experimental data on lepton moments $M_n(E_0)$. Note that averages over
experimental errors are given as percentages.

Note that for all experiments correlations are rather large, so it is essential to incorporate
them in a consistent way in the statistical analysis of the data. However, one has to be
careful with the treatment of the experimental correlations, for reasons to be described in
the next section.

Note that the results of this work, summarized in Section 6, consist of a parametrization
of experimental data without the need for any theoretical input. The uncertainties
associated with the parametrization of the lepton spectrum will therefore be reduced if
moments of the lepton spectrum are measured with greater accuracy
in the future.

3.1 Treatment of experimental correlations 

As has already been noticed, for example in the global analysis of B decays of Ref. [41],
the experimental correlation matrices $\rho_{ij}^{(\exp)}$, as presented with the
published data of the three experiments [22,24,25], are not positive definite. The source
of this problem can be traced to the fact that off-diagonal elements of the correlation matrix
are large, as expected since moments with similar energy cuts contain almost the same
amount of information and are therefore highly correlated. One can check that some
eigenvalues are negative and small, which indicates that the source of the problem
is insufficient accuracy in the computation of the elements of the correlation matrix.

However, whatever the original source of the problem, the fact that the experimental
correlation matrices are not positive definite has an important consequence: the technique
introduced in [7] for the generation of a sample of replicas of the experimental data in a way
that incorporates correlations relies on the existence of a positive definite correlation
matrix, and therefore cannot be applied if this is not the case.

A method to overcome these difficulties while keeping as much information
on experimental correlations as possible consists of removing those data points for
which the experimental correlations are larger than a maximum correlation, $\rho_{ij}^{(\exp)} > \rho^{\max}$.
The value of $\rho^{\max}$ is determined separately for each experiment as the maximum value for
which the resulting correlation matrix is positive definite. In Table 2 the values of $\rho^{\max}$
for each experiment are shown, together with the features of the remaining experimental
data after those data points with too large correlations have been removed. In the case of
the Belle measurements, another problem with the correlation matrix is that the contribution
to the correlation coefficients from systematic uncertainties has at present not been
incorporated.
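The pruning procedure described above can be sketched as follows. The text does not specify which point of a too strongly correlated pair is removed, so the greedy rule used here (drop the point that takes part in the largest number of violating pairs) is an assumption of this example, as are all function names. Positive definiteness is tested by attempting a Cholesky factorization.

```python
import math


def is_positive_definite(matrix):
    """True if the symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


def prune(corr, rho_max):
    """Indices kept after removing points with a correlation above rho_max.

    Greedy rule (an assumption of this sketch): repeatedly drop the point
    involved in the largest number of pairs with corr > rho_max.
    """
    keep = list(range(len(corr)))
    while True:
        counts = {i: sum(1 for j in keep if j != i and corr[i][j] > rho_max)
                  for i in keep}
        worst = max(keep, key=lambda i: counts[i])
        if counts[worst] == 0:
            return keep
        keep.remove(worst)


def largest_valid_rho_max(corr, candidates):
    """Largest candidate rho_max whose pruned matrix is positive definite."""
    for rho_max in sorted(candidates, reverse=True):
        keep = prune(corr, rho_max)
        sub = [[corr[i][j] for j in keep] for i in keep]
        if is_positive_definite(sub):
            return rho_max, keep
    return None, []
```

Scanning a grid of candidate thresholds from the largest downwards mirrors the requirement that $\rho^{\max}$ be the maximum value for which the remaining correlation matrix is positive definite.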
Experiment    $N_{\rm dat}$   n       $E_0$ (GeV)   $\langle\sigma_{\rm stat}\rangle$   $\langle\sigma_{\rm tot}\rangle$   $\rho^{\max}$   $\langle\rho\rangle$
Babar [22]    16              0 - 3   0.6 - 1.5     4.0%                                5.0%                               0.97            0.49
Belle [24]    15              1 - 3   0.4 - 1.5     18.0%                               19.0%                              0.88            0.31
Cleo [25]     10              1 - 2   0.6 - 1.5     0.5%                                1.0%                               0.95            0.69

Table 2: Features of experimental data that are included in the fit, after data points with too
large correlations have been removed. Note that averages over experimental errors are given as
percentages.

4. Replica generation 

As has been mentioned in the Introduction, in this work the strategy of Ref. [7] is followed to
parametrize the lepton energy spectrum from the experimental information on its moments.
The first step of this technique is to generate an ensemble of Monte Carlo replicas of the
original experimental data, which consist of the measured moments, denoted by

$$ M_i^{(\exp)} , \qquad i = 1, \ldots, N_{\rm dat} , \tag{4.1} $$

where $M_i^{(\exp)}$ stands for any of Eqns. 3.1-3.3 and $N_{\rm dat}$ is the total number of experimental
data points, together with the total error and the correlation matrix.

To generate replicas one proceeds as follows: the k-th artificial replica of the experimental
data, $M^{({\rm art})(k)}$, is constructed as

$$ M_i^{({\rm art})(k)} = M_i^{(\exp)} + s_i^{(k)} \sigma_i^{\rm tot} , \qquad i = 1, \ldots, N_{\rm dat} , \quad k = 1, \ldots, N_{\rm rep} , \tag{4.2} $$

where $s_i^{(k)}$ are Gaussian random numbers with the same correlation matrix as the experimental
correlation matrix $\rho_{ij}$, $\sigma_i^{\rm tot}$ is the total error of the i-th data point and $N_{\rm rep}$ is the number
of generated replicas of the experimental data. This way the ensemble of replicas is
able to reproduce not only the central values and the errors but also the correlations of the
experimental data.
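A minimal sketch of the replica generation of Eq. 4.2 is given below. Correlated Gaussian numbers are obtained by multiplying independent unit Gaussians by the Cholesky factor of the correlation matrix, which is one standard way of doing this and is not necessarily the procedure used by the authors; it also makes explicit why a positive definite correlation matrix is required. All names and the toy inputs are illustrative.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor; fails if not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower


def generate_replicas(central, sigma_tot, corr, n_rep, seed=0):
    """Replicas M_i^(art)(k) = M_i^(exp) + s_i^(k) * sigma_i^tot (Eq. 4.2)."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    n = len(central)
    replicas = []
    for _ in range(n_rep):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # s = L z has unit variances and correlation matrix L L^T = corr
        s = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        replicas.append([central[i] + s[i] * sigma_tot[i] for i in range(n)])
    return replicas
```

Averaging over a sufficiently large sample produced by `generate_replicas` should reproduce the input central values, errors and correlations, which is the criterion used in the text to fix the number of replicas.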

As explained in Ref. [7], the size of the replica sample is fixed by the condition that 
the averages over replicas reproduce the experimental values for central values, errors and 
correlations. The different statistical estimators are defined in Appendix B. In Table 3 the 
relevant statistical estimators for the replica generation are summarized. One can observe 
that to reach the desired accuracy of a few percent and to have scatter correlations r > 0.99 
for central values, errors and correlations, a sample of 1000 replicas is needed. 

5. Neural network training strategy 

As described in Ref. [7], the next step of our strategy is to train a neural network on each
of the replicas of the experimental data. Artificial neural networks^3 (see Fig. 3) are highly
nonlinear mappings between input and output patterns, defined by their parameters, called
weights $\omega_{ij}^{(l)}$ and thresholds $\theta_i^{(l)}$. They provide unbiased robust universal approximants to
incomplete or noisy data, and they interpolate between data points with the only assumption
of smoothness. A neural network is a suitable way of parametrizing experimental data since
it is a most unbiased prior, and moreover in combination with Monte Carlo methods
it provides a faithful estimation of the uncertainties associated with this parametrization.

^3 For an introduction to neural networks, see Ref. [7] and references therein.

Estimator                                               (three replica samples of increasing size)
$\langle PE[\sigma^{(\rm art)}] \rangle_{\rm dat}$      …                     10.0%                 …
$\langle \sigma^{(\rm art)} \rangle_{\rm dat}$          0.00265               0.00277               0.00268
$r[\sigma^{(\rm art)}]$                                 …                     0.99                  0.99
$\langle PE[\rho^{(\rm art)}] \rangle_{\rm dat}$        60.1%                 19.6%                 6.7%
$\langle \rho^{(\rm art)} \rangle_{\rm dat}$            0.132                 0.138                 0.155
$r[\rho^{(\rm art)}]$                                   0.75                  0.96                  0.99
$\langle {\rm cov}^{(\rm art)} \rangle_{\rm dat}$       $1.1 \cdot 10^{…}$    $1.4 \cdot 10^{…}$    $1.3 \cdot 10^{…}$
$r[{\rm cov}^{(\rm art)}]$                              0.86                  0.98                  0.99

Table 3: Comparison between experimental and Monte Carlo data.
The experimental data have $\langle \sigma^{(\exp)} \rangle_{\rm dat} = 0.00267$, $\langle \rho^{(\exp)} \rangle_{\rm dat} = 0.166$ and
$\langle {\rm cov}^{(\exp)} \rangle_{\rm dat} = 1.4 \cdot 10^{…}$, for a total of 41 data points.

Figure 3: Schematic representation of an artificial neural network.

In this work a particular class of neural networks called feed-forward perceptrons is
used. For this class of neural networks, the relation that gives the values (activation states)
of the i-th neuron in the l-th layer depends on the activation states of the neurons in
the previous layer,

$$ \xi_i^{(l)} = g\left( h_i^{(l)} \right) , \qquad i = 1, \ldots, n_l , \quad l = 2, \ldots, L , \tag{5.1} $$

$$ h_i^{(l)} = \sum_{j=1}^{n_{l-1}} \omega_{ij}^{(l-1)} \xi_j^{(l-1)} - \theta_i^{(l)} , \tag{5.2} $$

where $\theta_i^{(l)}$ is the activation threshold of the neuron, L the total number of layers of the
network and $n_l$ the number of neurons in each layer. The function $g(x)$ is the activation
function of the neuron, which is taken to be a sigmoid in the inner layers,

$$ g(x) = \frac{1}{1 + \exp(-x)} , \tag{5.3} $$

and a linear activation function $g(x) = x$ for the last neuron to increase the sensitivity
of the network. For illustrative purposes, let us consider a simple neural network, which
consists of two input neurons and one output neuron. If $\xi_1^{(1)}$ and $\xi_2^{(1)}$ are the values of the
input neurons, then the value of the output neuron will be, from Eqns. 5.1 and 5.2,

$$ \xi_1^{(2)} = f\left( \xi_1^{(1)}, \xi_2^{(1)} \right) = \left[ 1 + \exp\left( \theta_1^{(2)} - \omega_{11}^{(1)} \xi_1^{(1)} - \omega_{12}^{(1)} \xi_2^{(1)} \right) \right]^{-1} . \tag{5.4} $$

From the above explicit example it is clear that a neural network is a nonlinear function 
that relates the input values with the output values. 
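The feed-forward relation described above can be written as a short forward pass. The sketch below assumes the common convention that each neuron applies the activation function to the weighted sum of the previous layer minus its threshold, with sigmoid activations in the inner layers and a linear output neuron; the data layout, the function names and the random initialization of the thresholds are choices made for this illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable sigmoid g(x) = 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(x, weights, thresholds):
    """Output of a feed-forward perceptron with a single input x.

    weights[l][i][j] connects neuron j of one layer to neuron i of the
    next, thresholds[l][i] is subtracted before the activation (assumed
    sign convention). The last layer uses a linear activation.
    """
    state = [x]
    last = len(weights) - 1
    for layer, (w, th) in enumerate(zip(weights, thresholds)):
        h = [sum(w_ij * s for w_ij, s in zip(row, state)) - t
             for row, t in zip(w, th)]
        state = h if layer == last else [sigmoid(v) for v in h]
    return state


def random_network(architecture, omega_init=10.0, seed=0):
    """Random parameters in [-omega_init/2, omega_init/2] for e.g. [1, 4, 3, 1]."""
    rng = random.Random(seed)
    half = omega_init / 2.0
    weights, thresholds = [], []
    for n_in, n_out in zip(architecture[:-1], architecture[1:]):
        weights.append([[rng.uniform(-half, half) for _ in range(n_in)]
                        for _ in range(n_out)])
        thresholds.append([rng.uniform(-half, half) for _ in range(n_out)])
    return weights, thresholds
```

For an architecture 1-4-3-1 this layout has 27 free parameters (19 weights and 8 thresholds), which gives a feeling for the size of the parameter space explored during training.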

Therefore, the lepton energy spectrum is parametrized with a neural network,
where $E_l$ is the lepton energy, so that if $E_l$ is the input of the neural network, then
$(d\Gamma/dE_l)^{(\rm net)}$ is the associated output.

Training a neural network means determining its parameters (the neuron
weights and thresholds) so as to minimize a suitable statistical estimator. In our case, for each
replica the diagonal error will be minimized, defined as

$$ \chi^{2(k)} = \frac{1}{N_{\rm dat}} \sum_{i=1}^{N_{\rm dat}} \frac{\left( M_i^{({\rm art})(k)} - M_i^{({\rm net})(k)} \right)^2}{\left( \sigma_i^{\rm tot} \right)^2} , \tag{5.6} $$

where $M_i^{({\rm net})(k)}$ is the i-th moment as computed from the k-th neural network, which is
trained on the k-th replica of the experimental data, and $\sigma_i^{\rm tot}$ is the total error of the i-th
data point.
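In code, the diagonal error of Eq. 5.6 is a one-line sum. The sketch below assumes that the estimator is normalized to the number of data points, which is how the garbled equation is read here; the function and argument names are illustrative.

```python
def diagonal_chi2(replica_moments, network_moments, sigma_tot):
    """Diagonal error of Eq. 5.6 for one replica, normalized to N_dat.

    replica_moments: moments of the k-th replica of the data
    network_moments: the same moments computed from the k-th network
    sigma_tot:       total experimental errors of the data points
    """
    n_dat = len(replica_moments)
    return sum(((art - net) / sig) ** 2
               for art, net, sig in zip(replica_moments, network_moments, sigma_tot)) / n_dat
```

Because only the diagonal errors enter, correlations between data points are not used in the training itself; in the strategy described above they enter through the generation of the replicas.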

The minimization technique that will be used for the neural network training is genetic
algorithms, a minimization strategy that has been used in different high energy physics
applications [29]*. This method is especially suitable for finding the global minima of highly
nonlinear problems, such as the one discussed in this work. Other standard
deterministic minimization strategies, like for example MINUIT, are not suitable for this
problem since the parameter space is very large. Genetic algorithm minimization is
divided into steps called generations, so the number of generations of a given neural network
training corresponds to the training length of the minimization process.

As has been discussed previously in detail [7], the choice of the architecture of the neural
network (the number of layers and the number of neurons in each layer) cannot be derived
from general rules and must be tailored to each specific problem. In particular the neural
network has to be redundant, that is, it has to have a larger number of parameters than the
minimum number required to satisfactorily fit the patterns that have to be learnt, in this
case experimental data. However, the architecture cannot be arbitrarily large because then
the training length gets very large. In this case one finds that an acceptable compromise
is given by the architecture 1-4-3-1. A suitable criterion to choose the optimal architecture
is to take the architecture which is next to the first stable architecture, that is, the first
architecture that can fit the data and gives the same fit as an architecture with one
fewer neuron. This way one is sure that the neural network is redundant for the problem
that is considered. Fig. 4 shows a training on the experimental data with three different
architectures: one observes that 1-2-2-1 is not able to fit the data properly, but a
more complex architecture, 1-3-3-1, can fit these data. Therefore, the architecture 1-4-3-1 is
taken as the reference architecture for the parametrization of the lepton energy spectrum.

*See also Ref. [30] for a recent application of genetic algorithms as the minimization strategy in a high
energy physics problem.

The training of a neural network does not follow general rules either, and the optimal
minimization strategy must be determined for each particular problem. In the
present situation the training strategy that is adopted is the following: there is a single
training epoch in which the $\chi^{2(k)}$, Eq. 5.6, is minimized with dynamical stopping
of the training of the replicas. That is, for each replica, the training is stopped either
when the condition $\chi^{2(k)} < \chi^2_{\rm stop}$ is satisfied or when the maximum number of generations
$N_{\rm gen}$ is reached. One finds that $\chi^2_{\rm stop} = \ldots$ and $N_{\rm gen} = 3000$ are suitable choices. On top of
that, the neural network weights are initialized randomly in the interval $[-\omega_{\rm init}/2, \omega_{\rm init}/2]$, with
$\omega_{\rm init} = 10$. The rationale for this choice is that the natural value
for the neural network weights is observed to be $O(10)$, so the fit will be faster if the initial values of
the neural network weights are of the same order of magnitude.

Figure 4: Comparison of fits to the experimental data with different neural network
architectures, as a function of the number of generations.
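The training loop with dynamical stopping can be illustrated with a deliberately simplified genetic algorithm. The sketch below keeps a single best individual and accepts random one-parameter mutations that lower the error; it is a stand-in to show the role of the stopping threshold, the maximum number of generations and the initialization range, and is not the algorithm used by the authors. The parameters `n_mutants` and `step` are hypothetical.

```python
import random


def genetic_minimize(loss, n_params, chi2_stop, n_gen_max=3000,
                     omega_init=10.0, n_mutants=20, step=0.5, seed=0):
    """Mutation-selection minimization with dynamical stopping.

    Training stops as soon as loss <= chi2_stop or after n_gen_max
    generations, whichever comes first.
    Returns (best_parameters, best_loss, generations_used).
    """
    rng = random.Random(seed)
    # parameters start in [-omega_init/2, omega_init/2], as in the text
    best = [rng.uniform(-omega_init / 2.0, omega_init / 2.0)
            for _ in range(n_params)]
    best_loss = loss(best)
    generation = 0
    while best_loss > chi2_stop and generation < n_gen_max:
        for _ in range(n_mutants):
            mutant = list(best)
            k = rng.randrange(n_params)
            mutant[k] += rng.gauss(0.0, step)  # mutate one parameter
            mutant_loss = loss(mutant)
            if mutant_loss < best_loss:        # keep only improvements
                best, best_loss = mutant, mutant_loss
        generation += 1
    return best, best_loss, generation
```

In an actual fit, `loss` would map the network parameters to the $\chi^{2(k)}$ of Eq. 5.6 for the replica being trained, and the number of generations returned would play the role of the training length.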

Finally, in the training of the replicas the so-called weighted training will be used for
the genetic algorithm minimization. As has been shown in Refs. [7,8], it is in general
useful to weight the different experiments during the training according to their $\chi^2$, so that
more weight is given to those experiments with a larger value of their $\chi^2$ and the final
$\chi^2$ distribution is more homogeneous than in the unweighted case. The essential idea of this technique
is that the minimized $\chi^2_{\rm minim}$ is given by

$$ \chi^2_{\rm minim} = \frac{1}{N_{\rm dat}} \sum_{j=1}^{N_{\rm exp}} w_j N_{{\rm dat},j}\, \chi_j^2 , \tag{5.7} $$

where $N_{{\rm dat},j}$ is the number of data points and $\chi_j^2$ the value of the $\chi^2$, Eq. 5.6, of the j-th
experiment. One finds after a detailed analysis that the values $w_{\rm Babar} = 0.3$, $w_{\rm Belle} = 2$
and $w_{\rm Cleo} = 0.5$ are suited to obtain a more even $\chi_j^2$ distribution between experiments.
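The weighted error function of Eq. 5.7 can be written compactly as below. The weights and the numbers of data points are those quoted in the text and in Table 2; the overall normalization by the total number of data points is the reading of the garbled equation adopted in this sketch, and the dictionary layout is an illustrative choice.

```python
WEIGHTS = {"Babar": 0.3, "Belle": 2.0, "Cleo": 0.5}   # w_j quoted in the text
N_DAT = {"Babar": 16, "Belle": 15, "Cleo": 10}        # data points kept in the fit


def weighted_chi2(chi2_by_exp, n_dat_by_exp=N_DAT, weights=WEIGHTS):
    """Weighted chi^2 of Eq. 5.7 from the per-experiment chi^2 values."""
    n_dat = sum(n_dat_by_exp.values())
    return sum(weights[name] * n_dat_by_exp[name] * chi2_by_exp[name]
               for name in chi2_by_exp) / n_dat
```

With these weights a given change in the $\chi^2$ of the Belle data moves the minimized quantity more than the same change in the $\chi^2$ of the Babar or Cleo data, which is how the training is steered towards a more even distribution among experiments.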

5.1 Compatibility between experiments 

In a global analysis of experimental data from different experiments, as is
the case here (with Babar, Belle and Cleo), one also has to address the issue of possible
inconsistencies between different experiments, that is, the possibility that a subset of points
from two experiments in the same region of the parameter space do not agree with each
other within experimental errors. This issue is of paramount importance in the context of
global parton distribution fits, see for example [8,31]. In the present application, it can
be seen that the three experiments yield compatible results, as was already known from
global fits to B decay data.

Figure 5: $\chi^2$, Eq. 5.6, of the different experiments, for a fit to the experimental data including all
experiments.

Figure 6: $\chi^2$ of the different experiments when only the Babar data is fitted.

Figure 7: Same as in Fig. 6, but now only the Belle data is fitted.

This compatibility can be shown in a number of different ways, for example by training
on only one experiment and checking whether or not the other experiments can be predicted,
that is, whether they have a low $\chi^2$ even if they are not incorporated in the fit. In Fig. 5 we
show a fit to the experimental data in which all three experiments (Babar, Belle and Cleo)
are incorporated. One observes that at the end of the training all experiments
satisfy $\chi^2 \sim 1$. Fig. 6 shows the results of a fit when only Babar is incorporated in the
fit, and Fig. 7 the same fit, this time with data from the Belle experiment only.
Note that when only a single experiment is incorporated in the fit, as in Figs. 6 and 7,
only the $\chi^2$ of that experiment is expected to decrease, while the total $\chi^2$ might decrease
more slowly or even grow.

It is observed that the three experiments are not only compatible but also complementary.
In particular Cleo is predicted by both Belle and Babar (as expected since the
kinematical coverage of the Cleo experiment is included in that of the other two experiments),
while Belle and Babar cannot predict each other, as a consequence of the fact that different
regions of the parameter space are covered by the two experiments: only Babar has
experimental data on the n = 0 moment (partial decay rate), while only Belle has data at
the lowest lepton energy ($E_0 = 0.4$ GeV). It will be shown in Section 6 that the inclusion
of data from different experiments is also crucial for the correct estimation of the
uncertainties.

5.2 Kinematical constraints 

The lepton spectrum, Eq. 2.5, has to satisfy three constraints independently of the dynamics
of the process. First of all, it vanishes outside the region where it has kinematical
support; in particular it has to vanish at the kinematical endpoints, $E_l = 0$ and $E_l = E_{\max}$.
Second, the spectrum is a positive definite quantity (since any integral over it is an observable,
a partial branching ratio), and therefore it must satisfy a local positivity condition.

There are several methods to introduce kinematical constraints in our parametrization.
It has been found that, for the present application, the optimal method to implement the
kinematical constraint that the spectrum should vanish at the endpoints is to hard-wire
it into the parametrization, that is, the lepton energy spectrum parametrized by a
neural network will be given by

$$ \frac{d\Gamma}{dE_l}^{(\rm net)}(E_l) = E_l^{n_1} \left( E_{\max} - E_l \right)^{n_2} NN(E_l) , \qquad (5.8) $$

with $n_1, n_2$ positive numbers, and where $NN(E_l)$ is the output of the neural network for a given
value of $E_l$. The assumption of this functional behavior at the endpoints of the spectrum
introduces no bias since, as will be shown in Section 6, our results do not depend on the
values of $n_1, n_2$. For the reference training the values $n_1 = 1$ and $n_2 = 1$ have been chosen.
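The hard-wired endpoint behaviour described above can be illustrated with a minimal sketch. It assumes that the network output is multiplied by a prefactor of the form $E_l^{n_1}(E_{\max}-E_l)^{n_2}$; the function `toy_network` and the numerical value of `E_MAX` are hypothetical stand-ins chosen only for illustration, not the trained networks or the endpoint used in the analysis.

```python
import math

E_MAX = 2.3  # GeV; illustrative endpoint, not a value quoted in the text


def toy_network(e_l):
    """Hypothetical stand-in for the trained network output NN(E_l)."""
    return 1.0 + 0.5 * math.tanh(e_l - 1.0)


def spectrum(e_l, n1=1.0, n2=1.0, net=toy_network, e_max=E_MAX):
    """Spectrum with endpoint zeros built in: E^n1 * (Emax - E)^n2 * NN(E)."""
    if e_l < 0.0 or e_l > e_max:
        return 0.0  # no kinematical support outside [0, Emax]
    return (e_l ** n1) * ((e_max - e_l) ** n2) * net(e_l)
```

Whatever the network returns, the prefactor forces the spectrum to vanish at both endpoints, so the training only has to learn the shape in between.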

The remaining kinematical constraint, the positivity constraint, is imposed as a Lagrange
multiplier in the total error. That is, the quantity to be minimized, $\chi^2_{\rm tot,min}$, is the
sum of two terms,

$$ \chi^2_{\rm tot,min} = \chi^2_{\rm dat} + \chi^2_{\rm pos} , \qquad (5.9) $$

where $\chi^2_{\rm dat}$ is the contribution from experimental data, and the contribution from
the positivity constraint is defined as

$$ \chi^2_{\rm pos} = \lambda\, P\left[ \frac{d\Gamma}{dE_l}^{(\rm net)} \right] , \qquad (5.10) $$

where the positivity condition is implemented in such a way that those configurations in which
a region of the spectrum is negative are penalized,

$$ P\left[ \frac{d\Gamma}{dE_l}^{(\rm net)} \right] = - \int_0^{E_{\max}} dE_l\, \frac{d\Gamma}{dE_l}^{(\rm net)}(E_l)\, \theta\left( - \frac{d\Gamma}{dE_l}^{(\rm net)}(E_l) \right) , \qquad (5.11) $$

since $P$ is zero for a positive spectrum, and gives a positive contribution to the total
error function, Eq. 5.9, if some part of the lepton spectrum is negative. The relative weight
$\lambda$ in Eq. 5.10 is determined via a stability analysis, with the requirement that $\lambda$ is large
enough so that the constraint is satisfied, but small enough so that experimental data can
still be learned in an efficient way. It has been found that $\lambda$ = 10^'' satisfies the above
requirements. As will be shown in the next section, the implementation of the kinematical
constraints plays an essential role in the parametrization of the lepton spectrum.

Figure 8: The 1-$\sigma$ uncertainty band for the lepton energy spectrum, Eq. 2.5, as parametrized by
the Monte Carlo ensemble of neural networks.
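The positivity penalty can be sketched numerically as follows. This is only an illustration under stated assumptions: the integral over the negative part of the spectrum is approximated with a midpoint rule, and the names `positivity_penalty`, `total_error`, `lam` and `n_steps` are hypothetical and not taken from the original analysis code.

```python
def positivity_penalty(spectrum, e_max, n_steps=1000):
    """Approximate P = -(integral of f over the region where f < 0), midpoint rule."""
    h = e_max / n_steps
    total = 0.0
    for i in range(n_steps):
        f = spectrum((i + 0.5) * h)
        if f < 0.0:
            total -= f * h  # only negative regions contribute, with a positive sign
    return total


def total_error(chi2_dat, spectrum, e_max, lam):
    """chi2_tot = chi2_dat + lam * P[spectrum], with lam the relative weight."""
    return chi2_dat + lam * positivity_penalty(spectrum, e_max)
```

For a spectrum that is positive everywhere the penalty vanishes and the total error reduces to the data term, which is the behaviour required in the text.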



6. Results 

In this section the results on the parametrization of the lepton energy spectrum are
presented. These results consist of the sample of trained neural networks, from which
averages and moments can be computed with the associated uncertainties. The more technical
details of the neural network parametrization and the associated statistical
validation can be found in Appendix A.

6.1 Lepton energy spectrum 

In Fig. 8 the lepton energy spectrum with its uncertainties is shown. For illustration,
let us recall how the central value and the spread of this spectrum are computed from the
neural network sample. For the average one has

$$ \left\langle \frac{d\Gamma}{dE_l}(E_l) \right\rangle_{\rm rep} = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \frac{d\Gamma}{dE_l}^{({\rm net})(k)}(E_l) , \qquad (6.1) $$

while for the spread the appropriate expression is

$$ \sigma(E_l) = \sqrt{ \left\langle \left( \frac{d\Gamma}{dE_l}^{(\rm net)} \right)^2 (E_l) \right\rangle_{\rm rep} - \left\langle \frac{d\Gamma}{dE_l}^{(\rm net)}(E_l) \right\rangle_{\rm rep}^2 } . \qquad (6.2) $$

In Fig. 8 the 1-$\sigma$ envelope of the lepton spectrum is plotted, where the central value has
been computed with Eq. 6.1 and the standard deviation with Eq. 6.2. Note that the error
is rather small for large values of the lepton energy, $E_l > 1.8$ GeV, and it grows for smaller
values of $E_l$. The error bands for the other plots are computed in the same way.
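The computation of the central value and the spread from the replica sample can be sketched as below. The replicas are represented here simply as a list of Python callables, which is an assumption made for illustration; in practice each entry would be one trained neural network.

```python
import math


def replica_mean_and_spread(replicas, e_l):
    """Return (mean, standard deviation) of the replica spectra at energy e_l."""
    values = [f(e_l) for f in replicas]
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    variance = max(mean_sq - mean * mean, 0.0)  # guard against rounding below zero
    return mean, math.sqrt(variance)
```

Evaluating this function on a grid of lepton energies gives the central curve and the 1-$\sigma$ band of a plot such as Fig. 8.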

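As discussed before, the sample of trained neural networks reproduces the correlations
of the experimental data. For instance, imagine that one is interested in the correlation
between two moments of the lepton spectrum, $M_{n_1}(E_{01})$ and $M_{n_2}(E_{02})$, of arbitrary
order and arbitrary lepton energy cut. With the probability measure of the lepton energy
spectrum constructed in this work, this correlation $\rho_{12} = \rho(n_1, E_{01}, n_2, E_{02})$ is given by

$$ \rho_{12} = \frac{ \left\langle M_{n_1}^{(\rm net)}(E_{01})\, M_{n_2}^{(\rm net)}(E_{02}) \right\rangle_{\rm rep} - \left\langle M_{n_1}^{(\rm net)}(E_{01}) \right\rangle_{\rm rep} \left\langle M_{n_2}^{(\rm net)}(E_{02}) \right\rangle_{\rm rep} }{ \sqrt{ \left\langle M_{n_1}^{(\rm net)}(E_{01})^2 \right\rangle_{\rm rep} - \left\langle M_{n_1}^{(\rm net)}(E_{01}) \right\rangle_{\rm rep}^2 }\; \sqrt{ \left\langle M_{n_2}^{(\rm net)}(E_{02})^2 \right\rangle_{\rm rep} - \left\langle M_{n_2}^{(\rm net)}(E_{02}) \right\rangle_{\rm rep}^2 } } , \qquad (6.3) $$

where averages over the sample of neural networks are computed in the standard way, for
instance

$$ \left\langle M_{n_1}^{(\rm net)}(E_{01})\, M_{n_2}^{(\rm net)}(E_{02}) \right\rangle_{\rm rep} = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} M_{n_1}^{({\rm net})(k)}(E_{01})\, M_{n_2}^{({\rm net})(k)}(E_{02}) , \qquad (6.4) $$

and similarly for the remaining averages. This example clarifies the fact that not only
central values and total errors from experimental data, shown in Fig. 8, but also
correlations are present in the parametrization of the lepton energy spectrum.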
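The correlation between two moments over the replica sample can be sketched as follows. The inputs are assumed to be two equal-length lists holding the value of each moment on every replica; how those values are obtained (by integrating each network) is not shown, and the function name is hypothetical.

```python
import math


def moment_correlation(m1, m2):
    """Correlation between two moments, given their values replica by replica."""
    n = len(m1)
    avg1 = sum(m1) / n
    avg2 = sum(m2) / n
    cov = sum(a * b for a, b in zip(m1, m2)) / n - avg1 * avg2
    var1 = sum(a * a for a in m1) / n - avg1 * avg1
    var2 = sum(b * b for b in m2) / n - avg2 * avg2
    return cov / math.sqrt(var1 * var2)
```

Because every replica provides all moments simultaneously, the same routine works for any pair of orders and lepton energy cuts, including pairs that were never measured together.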

As explained in [7], it is crucial to validate the results of the parametrization
with suitable statistical estimators. In Table 4 the most relevant statistical estimators for
all the data points are summarized, and in Table 5 one has the same estimators for the
different experiments included in the fit. A more detailed analysis of these estimators is
found in Appendix A.

It has been checked that the large value of $\chi^2$ for the Belle experiment does not arise because
their data is globally not properly fitted (as can be seen in the plots), but that it is only
due to two points, $n = 2, 3$ at $E_0 = 1.5$ GeV, which have an anomalously large $\chi^2$. If those two
points are not included then $\chi^2_{\rm tot,Belle} \sim 0.92$. These two points are systematically below
the Babar data, with errors half as large. This is similar to what happened with the NMC
experiment in the proton structure function fits described in Ref. [7].








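$N_{\rm rep}$                                                10          100         1000

$\chi^2_{\rm tot}$                                           1.31        1.18        1.21
$\langle \chi^2 \rangle$                                     2.50        2.28        2.33

$\langle PE[\langle M \rangle_{\rm rep}] \rangle$            9%          8%          8%
$r[M]$                                                       0.999       0.999       0.999

$\langle PE[\sigma^{(\rm net)}] \rangle_{\rm dat}$           67%         58%         45%
$\langle \sigma^{(\rm exp)} \rangle_{\rm dat}$               0.00267     0.00267     0.00267
$\langle \sigma^{(\rm net)} \rangle_{\rm dat}$               0.00180     0.00169     0.00187
$r[\sigma^{(\rm net)}]$                                      0.77        0.85        0.86

$\langle \rho^{(\rm exp)} \rangle_{\rm dat}$                 U.ioo       U.ioo       U.ioo
$\langle \rho^{(\rm net)} \rangle_{\rm dat}$                 0.320       0.245       0.324
$r[\rho^{(\rm net)}]$                                        0.35        0.38        0.38

$\langle {\rm cov}^{(\rm exp)} \rangle_{\rm dat}$            1.4 10-6    1.4 10-6    1.4 10-6
$\langle {\rm cov}^{(\rm net)} \rangle_{\rm dat}$            7.8 10-^    6.7 10-'^   1.0 10-6
$r[{\rm cov}^{(\rm net)}]$                                   0.49        0.53        0.53

Table 4: Statistical estimators for the ensemble of trained neural networks, for 10, 100 and 1000
trained replicas.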





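Experiment                                                   Babar       Belle       Cleo

$\chi^2_{\rm tot}$                                           0.42        2.06        1.22
$\langle \chi^2 \rangle$                                     1.67        3.13        2.21

$\langle PE[\langle M \rangle_{\rm rep}] \rangle$            2.3%        18.1%       0.6%
$r[M]$                                                       0.999       0.999       0.999

$\langle PE[\sigma^{(\rm net)}] \rangle_{\rm dat}$           34%         44%         65%
$\langle \sigma^{(\rm exp)} \rangle_{\rm dat}$               0.0023      0.0021      0.0041
$\langle \sigma^{(\rm net)} \rangle_{\rm dat}$               0.0018      0.0017      0.0022
$r[\sigma^{(\rm net)}]$                                      0.91        0.89        0.83

$\langle \rho^{(\rm exp)} \rangle_{\rm dat}$                 0.16        0.40        0.31
$\langle \rho^{(\rm net)} \rangle_{\rm dat}$                 0.15        0.28        0.51
$r[\rho^{(\rm net)}]$                                        0.87        0.29        0.31

$\langle {\rm cov}^{(\rm exp)} \rangle_{\rm dat}$            6.9 10-6    1.5 10-6    6.5 10-"^
$\langle {\rm cov}^{(\rm net)} \rangle_{\rm dat}$            2.0 10-^    1.2 10-6    1.8 10-6
$r[{\rm cov}^{(\rm net)}]$                                   0.98        0.58        -0.21

Table 5: Statistical estimators for the ensemble of trained neural networks, for those experiments
included in the fit. The replica sample consists of 1000 neural networks.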

In Figs. 9 to 12 the moments of the lepton energy spectrum computed from
our parametrization are compared to the experimental data from Babar, Belle and Cleo, and
good agreement for all the data points is observed. Note that some of the experimental
data points have not been included in the training, for the reasons discussed in Section 3.1,
but nevertheless the lepton energy parametrization is in good agreement also with those
data points.

To assess the relevance of the implementation of kinematical constraints in our neural
network parametrization of the lepton spectrum, it is instructive to compare fits with and
without the inclusion of kinematical constraints. In Fig. 13 one can observe that when the
endpoint constraint at $E_l = 0$ and the positivity constraint are removed, the error becomes
very large at small $E_l$. This is so because the experimental data does not constrain the value
of the lepton spectrum for low values of $E_l$. Note that the physical value for the spectrum
at the endpoint, $(d\Gamma/dE_l)(E_l = 0) = 0$, is contained within the small-$E_l$ error bars.



Figure 9: Comparison of the partial branching ratio, Eq. 3.1, obtained from our
parametrization with the experimental measurements, as a function of the lower cut on
the lepton energy $E_0$. (Legend: NN parametrization, Babar data, Belle data, Cleo data;
horizontal axis: cut on lepton energy $E_0$ [GeV].)

Figure 10: Same as Fig. 9 for the first moment $M_1$, Eq. 3.2.

Figure 11: Same as Fig. 9 for the second moment, $M_2$, Eq. 3.3.

Figure 12: Same as Fig. 9 for the third moment, $M_3$, Eq. 3.3.



To estimate the contribution of the different experiments to the global fit, it is
interesting to compare it with a fit in which only one experiment, Babar, is incorporated
(see Fig. 14). It can be observed that when only the Babar data is fitted, the error at small
values of $E_l$ is much larger. This is so because, as discussed above, the Belle data, which
extends to lower values of the lepton energy, is crucial to determine the low-$E_l$ behaviour of
the lepton spectrum, together with the kinematical constraint at $E_l = 0$. Finally, Fig. 15 shows
that our results are independent of the precise choice of $n_1$ and $n_2$ in Eq. 5.8. In particular, a
fit with the values $n_1 = 1.5$ and $n_2 = 1.5$ in Eq. 5.8 gives the same results as the fit with
the reference values, $n_1 = 1$ and $n_2 = 1$.

Figure 13: Lepton energy spectrum when no kinematic constraints are incorporated in the fit.
One can see that in this case the error at small $E_l$ grows very large and the extrapolation to
$E_0 = 0$ becomes unreliable.

Figure 14: Lepton energy spectrum when only Babar data is incorporated in the fit. (Legend:
all experiments trained, only Babar trained; horizontal axis: lepton energy $E_l$ [GeV].)

Figure 15: Comparison of the lepton energy spectrum for different values of the
parameters $n_1$ and $n_2$. (Legend: $n_1 = 1$, $n_2 = 1$ and $n_1 = 1.5$, $n_2 = 1.5$; horizontal axis:
lepton energy $E_l$ [GeV].)

With the results described in this section the total branching ratio can be computed,
even if the experimental information is restricted to finite values of the cut $E_0$. This is possible
because the continuity condition implicit in the neural network definition, together with
the kinematic constraint, allows for an accurate extrapolation from the experimental data
with the lowest cut, $E_0 = 0.4$ GeV, to the kinematic endpoint $E_l = 0$. Note that this is not true if
the Belle data is not included in the fit, see Fig. 14. The result obtained for the
partial decay rate at $E_0 = 0$, computed from the neural network sample,

$$ \left\langle B(B \to X_c l \nu) \right\rangle = \left\langle M_0(E_0 = 0) \right\rangle = \tau_B \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \int_0^{E_{\max}} dE_l\, \frac{d\Gamma}{dE_l}^{({\rm net})(k)}(E_l) , \qquad (6.5) $$

is the total branching ratio,

$$ B(B \to X_c l \nu) = (10.8 \pm 0.4) \times 10^{-2} , \qquad (6.6) $$

which is to be compared with the 2005 update of the PDG result [32] for the average
branching ratio of neutral and charged B mesons,

$$ B(B \to X_c l \nu)_{\rm PDG} = (10.87 \pm 0.17) \times 10^{-2} , \qquad (6.7) $$

and with the direct Delphi measurement of the total branching ratio [33], which is performed
without restrictions on the lepton energy and yields

$$ B(B \to X_c l \nu)_{\rm Delphi} = (10.5 \pm 0.2) \times 10^{-2} . \qquad (6.8) $$

It is observed that the three results are compatible, even if our determination is somewhat
closer, both in the central value and in the size of the uncertainty, to the Delphi
measurement. The small error in our determination of $B(B \to X_c l \nu)$ shows that the technique
discussed in this work can also be used to extrapolate in a faithful way into regions where
there is no experimental data available.
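The extrapolation to zero cut can be sketched numerically as below. This is a minimal illustration, assuming that each replica is available as a Python callable already normalized so that its integral is a branching ratio (that is, the lifetime factor is absorbed into the callable); the function names and the midpoint integration rule are choices made here, not taken from the original code.

```python
import math


def integrate(f, a, b, n_steps=2000):
    """Midpoint-rule integral of f between a and b."""
    h = (b - a) / n_steps
    return sum(f(a + (i + 0.5) * h) for i in range(n_steps)) * h


def partial_rate(replicas, e0, e_max):
    """Mean and spread over replicas of the spectrum integrated from e0 to e_max."""
    values = [integrate(f, e0, e_max) for f in replicas]
    n = len(values)
    mean = sum(values) / n
    variance = max(sum(v * v for v in values) / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)
```

Setting `e0 = 0.0` gives the analogue of the total branching ratio, with an uncertainty that reflects how tightly the replicas are constrained in the region without data.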

The results of this section show that from the available experimental data one can 
reconstruct the underlying lepton energy spectrum with good accuracy. 

6.2 Comparison with theoretical predictions 

As one example of the applications of the present parametrization of the lepton energy
spectrum, in this section our results are compared with the theoretical calculation of Ref. [16]
(AGRU). Their formalism allows the computation of moments of different differential
distributions from semileptonic B meson decays, with arbitrary kinematical cuts, like the
lepton energy spectrum in charmed decays that is analyzed in the present work. In
particular, their computation of the moments of the lepton energy spectrum will be studied,
which they define as

$$ N_k = \frac{1}{\Gamma_{\rm LO}} \int dE_l\, dq^2\, dr\; \hat{E}_l^{k}\, \frac{d^3\Gamma}{dE_l\, dr\, dq^2} , \qquad (6.9) $$

with $\hat{E}_l = E_l/m_b$, and where the leading order partonic semileptonic decay rate $\Gamma_{\rm LO}$ is given
by

$$ \Gamma_{\rm LO} = \Gamma_0 |V_{cb}|^2 z_0(\rho) , \qquad (6.10) $$

where $\rho = m_c^2/m_b^2$ and the phase space factor is defined in Eq. 2.12. These moments can
be related to the moments as measured experimentally, defined in Eqs. (3.1-3.3), in a
straightforward way; for example, for the first two moments one has

$$ M_0 = \tau_B \Gamma_0 N_0 , \qquad M_1 = m_b \frac{N_1}{N_0} , \qquad (6.11) $$

and similarly for the other moments.

In Figs. 16 and 17 the results of [16], both at leading order (LO) and at next-to-leading
order (NLO), are compared with the moments obtained from our parametrization
as a function of the lower cut on the lepton energy $E_0$. Comparing results at different
perturbative orders is interesting in order to assess the behaviour of the perturbative expansion.
One should take into account in this comparison that the results of [16] are purely perturbative,
and therefore the difference between the two results could be an indicator of the size of the
missing nonperturbative corrections. Another interesting feature is that while for $M_0$, the
partial branching fraction, the NLO corrections are sizable and bring the theoretical
prediction into better agreement with the experimental measurement, for $M_1$ (which is the ratio
of two perturbative expansions) the size of the perturbative corrections is much smaller.

Footnote to Figs. 16 and 17: We thank G. Ridolfi for providing us with the code used for their
calculations.

Figure 16: Comparison of the results of Ref. [16] on the partial branching ratio, Eq. 3.1,
and the same quantity computed from our parametrization.

Figure 17: Same as Fig. 16 but for the first moment, Eq. 3.2.

Figure 18: Comparison of the results of Ref. [16] on the partial branching ratio, Eq. 3.1,
at NLO with associated theoretical uncertainties with the same quantity computed
from our parametrization.

Figure 19: Same as Fig. 18 but for the first moment, Eq. 3.2.

Figure 20: Comparison of the results of Ref. [16] on the fourth moment $M_4(E_0)$ at NLO with
associated theoretical uncertainties with the same quantity computed from our parametrization.
Note that this moment has not been measured experimentally.

In Figs. 18 and 19 we show results similar to those of Figs. 16 and 17, but this
time with an estimation of the theoretical uncertainties associated with the predictions of
Ref. [16]. These theoretical uncertainties are obtained by varying the b quark mass $m_b$ by
100 MeV above and below the central value, and similarly for the strong coupling $\alpha_s(m_b)$.
The known fact that the uncertainties of theoretical predictions grow for large values of
the cut on the lepton energy $E_0$ is clearly observed in these results. Note that in all cases
the comparison of theoretical predictions with experimental measurements can be performed for
arbitrary values of the cut on the lepton energy $E_0$.

On top of that, in Fig. 20 we compare a quantity that has not been measured, the 
fourth moment of the spectrum, defined as 



We observe good agreement between the theoretical prediction and the experimental data in
the region $0.8 < E_0 < 1.5$ GeV, with rather small uncertainties in both cases. It can also
be seen that for $E_0 > 1.5$ GeV the theoretical uncertainty for this moment grows while
the experimental uncertainty remains rather small, which implies that theoretical
uncertainties should be reduced at least by a factor of 2 or 3 to be able to perform quantitative
comparisons with experimental data for moments with a large cut on the lepton energy. The
results in this section therefore indicate quantitatively by how much the uncertainties in
theoretical predictions should be reduced to obtain a meaningful comparison with experimental
data, for example for moments with large cuts on the lepton energy $E_0$.

A more general comparison with theoretical predictions should also include the known
nonperturbative power corrections up to order $\mathcal{O}(1/m_b^3)$ in the expressions for the
moments of the spectrum, since in this case the difference between the theoretical results and our
parametrization would indicate the size of the missing unknown corrections, both
perturbative and nonperturbative. A more detailed study of this point, together with an analysis
of possible violations of local quark-hadron duality, is left for future work.

The analysis presented in this section is a particular example of how the technique
introduced in this work allows a more general comparison of theoretical predictions with
experimental data. For example, current experiments do not measure the leptonic moments
with $E_0 > 1.5$ GeV, since it is argued that the corresponding theoretical prediction has large
uncertainties. If in the future this theoretical error in the computation of leptonic moments
with large values of the cut $E_0$ is reduced, the comparison with experimental results will be
straightforward if one computes these moments from the neural network parametrization
of the lepton spectrum, which encodes all the information on the available experimental data.

7. Determination of $m_b^{1S}$ and $\lambda_1$

As another application of our parametrization of the lepton energy spectrum, it will be
used to determine the b quark mass $m_b^{1S}$ from the experimental data using a novel strategy.
To this purpose the technique of Ref. [14] will be used, which consists of minimizing
the size of the higher order corrections in order to obtain sets of moments of the lepton energy
spectrum which have reduced theoretical uncertainty for the extraction of nonperturbative
parameters like $\bar\Lambda_{1S}$ and $\lambda_1$. Note that the nonperturbative parameter $\bar\Lambda_{1S}$ relates the
spin-averaged B meson mass $\bar{m}_B$ to the 1S scheme b-quark mass,

$$ \bar\Lambda_{1S} = \bar{m}_B - m_b^{1S} . \qquad (7.1) $$

The 1S scheme b-quark mass is related to the standard b quark pole mass $m_b^{\rm pole}$ by a
perturbative relation (Eq. 7.2). The use of heavy quark masses which are not infrared
sensitive, like the 1S mass or the $\overline{\rm MS}$ mass, is compulsory to avoid the uncertainties
associated with the renormalon divergence of some infrared sensitive definitions of the heavy
quark masses, like the pole mass. The parameter $\lambda_1$ that appears in Eq. 2.11 is related to the
definition of the heavy quark pole masses [20] in the following way,

$$ m_b^{\rm pole} - m_c^{\rm pole} = \bar{m}_B - \bar{m}_D + \lambda_1 \left( \frac{\bar{m}_B - \bar{m}_D}{2 \bar{m}_B \bar{m}_D} \right) . \qquad (7.3) $$

The moments that minimize the impact of the higher order nonperturbative corrections
are given by



and 



Ji.4 m^^^ 



= il± ' "^^ ' . (7.5) 

The full expressions for these moments in terms of heavy-quark nonperturbative parameters
can be found in Ref. [14], where in terms of their original notation one has $R_1 = R_a^{(1)}$ and
$R_2 = R_a^{(2)}$. These leptonic moments $R_1$ and $R_2$ depend on 9 nonperturbative parameters
up to $\mathcal{O}(1/m_b^3)$: $\bar\Lambda_{1S}$, $\lambda_1$ and $\lambda_2$, and six matrix elements, $\rho_1$, $\rho_2$, $T_1$, $T_2$, $T_3$ and $T_4$, that
contribute at order $1/m_b^3$ in the heavy quark expansion. Present data is not capable of a
determination of all these matrix elements. For the $\lambda_2$ parameter we use

while, to assess the contribution of the $\mathcal{O}(1/m_b^3)$ parameters, the matrix elements $T_i$ are
varied between $\pm (500~{\rm MeV})^3$ (the expected size of these matrix elements), $\rho_1$ between 0 and
$(500~{\rm MeV})^3$ (since from the vacuum saturation approximation one knows that $\rho_1 > 0$), and
for the matrix element $\rho_2$ one uses the relation from the power corrections to the meson
mass splittings [20, 36],

P'^ ^ 2 (m^ — Km-c) ' 



where we have defined

$$ \kappa = \left( \frac{\alpha_s(m_c)}{\alpha_s(m_b)} \right)^{3/\beta_0} , \qquad \Delta m_Q = m_Q^{*} - m_Q , \qquad (7.8) $$

and where $\kappa$ accounts for the scale dependence of the parameter $\lambda_2$.

The most relevant feature of these leptonic moments $R_1$ and $R_2$ is that they have
non-integer powers and, to the best of our knowledge, have not yet been experimentally measured,
at least in a published form. Therefore, the values of $R_1$ and $R_2$ that will be used in this
analysis are extracted from our neural network parametrization of the lepton spectrum,
which allows the computation of arbitrary moments, together with their associated errors
and correlations. Let us recall that the central values are determined as

^1.4 (irt"^')^^) 



^ JVrop f-^™ Pl-4dr^^^^j^^p \ JIT. 

?(net)\ _ 1 p(net)(fc) p(net)(fc) _ Jl.3 dEi 

'^''P k=i Ji ^i—dE, — 



and similarly for $R_2$, and the error and the correlation of the moments $R_1$ and $R_2$ are
computed in the standard way. The following values for the moments with the associated
errors and their correlation are obtained,

^(net) ^ ^ ^^^ _^ ^(net) ^ Q ^^g _^ ^^^^^ ^ ^ ^^^^ 
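that, as expected, are highly correlated. Then, to determine the nonperturbative parameters
$\bar\Lambda_{1S}$ and $\lambda_1$, the associated $\chi^2$ is minimized,

$$ \chi^2 = \sum_{i,j=1}^{2} \left( R_i^{(\rm net)} - R_i^{(\rm th)}(\bar\Lambda_{1S}, \lambda_1) \right) {\rm cov}^{-1}_{ij} \left( R_j^{(\rm net)} - R_j^{(\rm th)}(\bar\Lambda_{1S}, \lambda_1) \right) , \qquad (7.11) $$

where ${\rm cov}^{-1}_{ij}$ is the inverse of the covariance matrix associated with the two moments $R_1^{(\rm net)}$
and $R_2^{(\rm net)}$, and $R_i^{(\rm th)}(\bar\Lambda_{1S}, \lambda_1)$ is the theoretical prediction for these moments as a function
of the two nonperturbative parameters [14].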
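The quantity to be minimized for two correlated moments can be sketched as follows. This is an illustrative sketch only: the function name and argument layout are hypothetical, the theoretical predictions are passed in as plain numbers rather than computed from the expressions of Ref. [14], and the 2x2 covariance matrix is inverted in closed form.

```python
def chi2_two_moments(r_net, r_th, cov):
    """chi2 = sum_ij d_i (cov^-1)_ij d_j for two moments, with d = r_net - r_th."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular covariance matrix")
    inv = ((d / det, -b / det), (-c / det, a / det))  # closed-form 2x2 inverse
    diff = (r_net[0] - r_th[0], r_net[1] - r_th[1])
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
```

In an actual fit, `r_th` would be re-evaluated for each trial value of the two nonperturbative parameters and the pair giving the smallest value would be kept. Using the full covariance matrix matters here because the two moments are highly correlated.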

Once the values of $\bar\Lambda_{1S}$ and $\lambda_1$ have been determined from the minimization of Eq.
7.11, if one uses for the spin-averaged B meson mass the value of the current world
average [32],

$$ \bar{m}_B = \frac{1}{4} \left( m_P + 3 m_V \right) = (5.3135 \pm 0.0008)~{\rm GeV} , \qquad (7.12) $$

then, using the extracted value of $\bar\Lambda_{1S}$,

$$ \bar\Lambda_{1S} = (0.47 \pm 0.14^{\rm exp} \pm 0.05^{\rm th})~{\rm GeV} , \qquad (7.13) $$

one obtains for the b quark mass in the 1S scheme the following value:

$$ m_b^{1S} = \bar{m}_B - \bar\Lambda_{1S} = (4.84 \pm 0.14^{\rm exp} \pm 0.05^{\rm th})~{\rm GeV} = (4.84 \pm 0.15^{\rm tot})~{\rm GeV} . \qquad (7.14) $$
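As a quick arithmetic cross-check of Eqs. 7.12 to 7.14, the central value and the total error can be recomputed as below. The sketch assumes that the experimental and theoretical uncertainties are combined in quadrature, which the text does not state explicitly but which reproduces the quoted total; the variable names are illustrative.

```python
import math

m_B_avg = 5.3135              # GeV, spin-averaged B meson mass, Eq. 7.12
lambda_bar_1S = 0.47          # GeV, central value of Eq. 7.13
err_exp, err_th = 0.14, 0.05  # GeV, experimental and theoretical uncertainties

m_b_1S = m_B_avg - lambda_bar_1S        # Eq. 7.14, central value
err_tot = math.hypot(err_exp, err_th)   # quadrature sum (assumed combination rule)

print(f"m_b(1S) = {m_b_1S:.2f} +- {err_tot:.2f} GeV")
```

The uncertainty on the meson mass itself (0.0008 GeV) is negligible at this precision and is not propagated in the sketch.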

From the above results one observes that the dominant source of uncertainty is the
experimental uncertainty, that is, the uncertainty associated with our parametrization of the lepton
energy spectrum. This determination of the b quark mass is consistent with determinations
from other analyses. The b quark mass has been determined using different techniques,
like the sum rule approach, using either non-relativistic [34,35,37,38] or relativistic [39,40]
sum rules, global fits of moments of differential distributions in B decays [28,33,41], the
renormalon analysis of Ref. [42], and several other methods related to heavy-quarkonium
physics [43,44] (see [45] for a review). To compare our results with some of the above
references, it is useful to relate the $m_b^{1S}$ mass to the $\overline{\rm MS}$ mass $\bar{m}_b(\bar{m}_b)$ [35,46]. Once
the conversion is performed, the value

$$ \bar{m}_b(\bar{m}_b) = (4.31 \pm 0.15^{\rm tot})~{\rm GeV} , \qquad (7.15) $$

is obtained, where we have used $\alpha_s(M_Z) = 0.1182$ and included perturbative corrections
up to two loops. It turns out that our determination of $\bar{m}_b(\bar{m}_b)$ is not competitive with
respect to other determinations, since only two moments, $R_1$ and $R_2$, are used to constrain
the nonperturbative parameters in the fit. Note therefore that the relatively large error
in the extraction of $\bar{m}_b(\bar{m}_b)$ is not due to large uncertainties in our parametrization
of the lepton energy spectrum, which are the same as those of the experimental data, but rather
to the use of a reduced set of moments in the fit. Note also that the motivation to
perform this determination of $\bar{m}_b(\bar{m}_b)$ is to show how the neural network parametrization
constructed in this work allows a more general comparison of experimental data with
theoretical predictions, in this case allowing the use of moments with fractional index, with their
errors and correlations, which have not been measured directly, at least in a published form.
The inclusion of additional moments would therefore further constrain the nonperturbative
parameters and reduce the experimental uncertainty associated with them.

Footnote on the mass scheme conversion: We thank Andre Hoang for pointing out to us the
appropriate references to perform the mass scheme conversion.
For the nonperturbative parameter $\lambda_1$ the following value is obtained,

$$ \lambda_1 = (-0.16 \pm 0.14^{\rm exp} \pm 0.05^{\rm th})~{\rm GeV}^2 = (-0.16 \pm 0.15^{\rm tot})~{\rm GeV}^2 . \qquad (7.16) $$

As in the determination of $\bar\Lambda_{1S}$, it can be seen that the theoretical uncertainties are smaller
than the experimental ones, which are the dominant ones. Our result for the parameter
$\lambda_1$ is consistent with other extractions in the context of global fits of B decay data [33,41],
but again not competitive due to the large experimental uncertainties.

In summary, a determination of $m_b^{1S}$ and $\lambda_1$ has been obtained from our neural network
parametrization in a way that was not directly possible from the available experimental
data. However, it turns out that the experimental uncertainties in the present
determination do not allow these results to be competitive with those from other determinations using
different techniques, even if in this approach the theoretical uncertainties were minimized.

8. Conclusions and outlook 

This work presents a determination of the probability density in the space of lepton
energy spectra from semileptonic B meson decays, based on the latest available data from
the Babar, Belle and Cleo collaborations. It makes use of a combination of Monte Carlo
techniques and neural networks, which results in an unbiased parametrization with a faithful
estimation of the uncertainties. In addition, this work shows the implementation of a
well-defined strategy to reconstruct functions with uncertainties when the only available
experimental information comes through convolutions of these functions. Moreover, in
our formalism the implementation of arbitrary theoretical constraints can be done in a
consistent and unbiased way.

As a byproduct of our analysis, with our parametrization of the lepton spectrum,
the nonperturbative parameters $\bar\Lambda_{1S}$ and $\lambda_1$ have been extracted in a way that minimizes
the theoretical uncertainties. For the b quark mass in the 1S scheme the result $m_b^{1S}$ =
$\bar{m}_B - \bar\Lambda_{1S}$ = $(4.84 \pm 0.14^{\rm exp} \pm 0.05^{\rm th})$ GeV has been obtained. Although this application
demonstrates the flexibility of our approach to allow a more general comparison of data with
theoretical predictions, it turns out that the use of a reduced set of non-integer moments
implies that the uncertainties are too large to make this determination competitive with
those from global fits of moments of B meson decay distributions [28,33,41], which include
additional moments. A reevaluation of $m_b^{1S}$ from our neural network parametrization of
the lepton energy spectrum with a larger set of moments will be studied in a following
work.

The number of possible applications of this strategy to other problems in B physics is
rather large. For example, the inclusion of hadronic moments would allow a parametrization
of the double differential decay rate

in terms of a neural network with two inputs. From this two-dimensional spectrum the
hadronic invariant mass moments, as have been recently measured by Babar [47] and
Belle [48], would be computed as

/•-Bmax /•Ms/2 J2p 

HniEo,f^) = J^ dEij^ dr{r-iir-^{Ei,r) . (8.2) 

In this case both the training and the implementation of the kinematical constraints are
rather more complicated, since one has to parametrize a two-dimensional surface.

The charmless decay channel, $B \to X_u l \nu$, would also be interesting to analyze, since it
has received a lot of theoretical attention recently, especially in the context of effective field
theories, see for example Refs. [49-54] and references therein. However, this mode is more
challenging to measure due to large backgrounds. Another interesting application would
be to address with our technique the issue of the parametrization dependence of the form
factor of the $B \to \pi l \nu$ exclusive channel, as discussed in Ref. [55].

A process that is closely related to the semileptonic decays is the analysis of the photon
energy spectrum in $B \to X_s \gamma$ decays [57,58]. This process has also been recently measured
with good accuracy at the B factories, by Babar [59] and by Belle [60]. The strategy to
be followed in this process would be very similar to that of the present work, since the
experimental information has the same form.

Finally, one can use the neural network strategy to construct a parametrization of the
shape function $S(k)$ of the B meson, a universal characteristic of the B meson that governs
inclusive decay spectra in processes with massless partons in the final state, as extracted
from the $B \to X_s \gamma$ and $B \to X_u l \nu$ decay modes. In this case there exists more theoretical
information on its shape. For example, at tree level its moments

$$ A_n = \int dk\, k^n S(k) \qquad (8.3) $$

have to satisfy $A_0 = 1$, $A_1 = 0$ and $A_2 = -\lambda_1/3$. At higher orders these relations are
theoretically more controversial [61,62]. Since the uncertainty from the extraction of $S(k)$
is the dominant source of theoretical uncertainty in the extraction of some CKM matrix elements,
it would therefore be interesting to estimate this uncertainty again with the technique
presented in this work, since in the current approach [63] the shape function uncertainties
are estimated in a rather crude way, with a combination of different functional forms
compatible with the theoretical constraints. The application of the techniques introduced
in this work to obtain an unbiased parametrization of the B meson shape function with
a faithful estimation of its uncertainties from experimental data will be discussed in a
forthcoming publication [12].

The set of trained neural networks that represents the probability measure in the space
of differential lepton energy spectra, together with the driver program and a user manual,
are available from the author (joanrojo@ecm.ub.es).
A. Details of the neural network training

In this Appendix a more detailed statistical analysis of the parametrization of the lepton
energy spectrum is performed. To understand better the process of neural network training
it is interesting to analyze the evolution of the different statistical estimators, as defined
in Appendix B, with respect to the number of generations, that is, with respect to the
training length. In Fig. 21 the evolution of $\chi^2_{\rm tot}$ and of $\langle \chi^2 \rangle$ computed from the trained
replica sample can be observed. Note that at the end of the training $\chi^2_{\rm tot} \sim 1$ and $\langle \chi^2 \rangle \sim 2$,
as expected. Note also that the fit has reached convergence, since the $\chi^2_{\rm tot}$ profile is very
flat for a large number of generations.

This can be repeated for other statistical estimators, for example the average
spread of the data points as computed from the neural network ensemble, Fig. 22, defined
in Appendix B, which is to be compared with the same quantity computed from the
experimental data, $\langle \sigma^{(\rm exp)} \rangle_{\rm dat}$. The fact that one has error reduction, as has been explained
in [7], is the sign that the network has found an underlying law in the experimental data,
in this case the lepton energy spectrum.

Another relevant estimator is the so-called scatter correlation of the spread of the points
(see Fig. 23). The scatter correlation indicates the size of the spread of data around a
straight line. Specifically, $r[\sigma^{(\rm net)}] = 1$ implies that $\sigma_i^{(\rm net)}$ is proportional to $\sigma_i^{(\rm exp)}$. One
can define similarly a scatter correlation for the net correlation $\rho_{ij}^{(\rm net)}$, also represented in
Fig. 23 for the Babar experimental data. One observes that when the training ends both
values of $r$ are close to 1, a sign that errors and correlations are faithfully reproduced.

Another relevant estimator of the goodness of the fit is the distribution of both $\chi^{2(k)}$ 
and of the training lengths over the replica sample, Figures 24 and 25. The distribution of 
$\chi^{2(k)}$ over the replica sample should be rather peaked around $\langle\chi^2\rangle$, because the opposite 
case would mean that the averaged result is obtained as a combination of very good fits with 
very bad fits (in the sense of fits with very large $\chi^2$). It can be observed in Figure 24 
that indeed our distribution is very peaked around the average. On the other hand, the 
distribution of training lengths, Fig. 25, has to be smooth and cannot be peaked at 
$N_{\mathrm{gen}}$, because if too large a fraction of the nets never reach the condition $\chi^{2(k)} \le \chi^2_{\mathrm{stop}}$ 
then effectively one is stopping the training after a fixed number of generations regardless 
of the value of the $\chi^{2(k)}$ of the trained replica. It can be seen that only ~ 20% of the 
trained replicas do not reach $\chi^2_{\mathrm{stop}}$, which is an acceptable fraction. 
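
The two checks described above can be sketched as follows. This is only an illustrative summary under stated assumptions: the function name, the per-replica inputs and all numbers are hypothetical, and a replica is counted as not having reached the stopping condition when its training length equals the maximum number of generations.

```python
import math


def replica_sample_diagnostics(chi2_per_replica, training_lengths, n_gen_max):
    """Summarize the chi2 and training-length distributions over the replicas.

    Returns the mean chi2, its standard deviation over the replica sample,
    and the fraction of replicas trained for the maximum number of generations
    (taken here to mean that the stopping condition was never reached).
    """
    if len(chi2_per_replica) != len(training_lengths):
        raise ValueError("one chi2 and one training length per replica expected")
    n_rep = len(chi2_per_replica)
    mean_chi2 = sum(chi2_per_replica) / n_rep
    variance = sum((c - mean_chi2) ** 2 for c in chi2_per_replica) / n_rep
    spread = math.sqrt(variance)
    n_at_max = sum(1 for t in training_lengths if t >= n_gen_max)
    return mean_chi2, spread, n_at_max / n_rep


# Hypothetical sample of ten trained replicas, for illustration only.
chi2_k = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.0]
lengths = [120, 340, 500, 210, 95, 500, 280, 310, 150, 60]

mean_chi2, spread, fraction_at_max = replica_sample_diagnostics(
    chi2_k, lengths, n_gen_max=500
)
```

A small spread relative to the mean corresponds to a distribution peaked around $\langle\chi^2\rangle$, and the last number is the fraction to be compared with the ~ 20% quoted above.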

Figure 21: Total $\chi^2_{\mathrm{tot}}$, Eq. B.12, of the replica sample, compared with the average error $\langle\chi^2\rangle$, Eq. B.13. 

Figure 22: Average error of the data points as computed from the neural network sample, Eq. B.2, as compared with the experimental value. 

Figure 23: The scatter correlations, Eq. B.11, as a function of the length of the training, for the Babar experimental data. 

B. Statistical estimators 

In this appendix the statistical estimators that are used to assess the quality of both the 
Monte Carlo replica generation and the neural network training are described. The 
superscripts (dat), (art) and (net) refer respectively to the original data, to the $N_{\mathrm{rep}}$ Monte 
Carlo replicas of the data, and to the $N_{\mathrm{rep}}$ neural networks. The subscripts rep and dat 
refer respectively to whether averages are taken by summing over all replicas or over all 
data. 

• Replica averages 
Figure 24: Distribution of $\chi^{2(k)}$ over the sample of trained replicas. 

Figure 25: Distribution of training lengths over the sample of trained replicas. 

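where the scatter variances are defined as 

$$\sigma_s^{(\mathrm{exp})} = \sqrt{\left\langle \left(F^{(\mathrm{exp})}\right)^2 \right\rangle_{\mathrm{dat}} - \left(\left\langle F^{(\mathrm{exp})} \right\rangle_{\mathrm{dat}}\right)^2}\,, \qquad \sigma_s^{(\mathrm{art})} = \sqrt{\left\langle \left(\left\langle F^{(\mathrm{art})} \right\rangle_{\mathrm{rep}}\right)^2 \right\rangle_{\mathrm{dat}} - \left(\left\langle \left\langle F^{(\mathrm{art})} \right\rangle_{\mathrm{rep}} \right\rangle_{\mathrm{dat}}\right)^2}\,. \qquad \text{(B.8)}$$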



We define analogously $r[\sigma^{(\mathrm{art})}]$, $r[\rho^{(\mathrm{art})}]$ and $r[\mathrm{cov}^{(\mathrm{art})}]$. Note that the scatter 
correlation and scatter variance are not related to the variance and correlation 
Eqs. B.2-B.4. 

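— Average variance: 

$$\left\langle \sigma^{(\mathrm{art})} \right\rangle_{\mathrm{dat}} = \frac{1}{N_{\mathrm{dat}}} \sum_{i=1}^{N_{\mathrm{dat}}} \sigma_i^{(\mathrm{art})}\,. \qquad \text{(B.9)}$$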



We define analogously $\langle\rho^{(\mathrm{art})}\rangle_{\mathrm{dat}}$ and $\langle\mathrm{cov}^{(\mathrm{art})}\rangle_{\mathrm{dat}}$, as well as the corresponding 
experimental quantities. 

• Neural network averages 

— Mean variance and percentage error on central values over the $N_{\mathrm{dat}}$ data points. 

— We define analogously percentage errors on the correlation and covariance. 

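— Scatter correlation 

$$r\left[F^{(\mathrm{net})}\right] = \frac{\left\langle F^{(\mathrm{exp})} \left\langle F^{(\mathrm{net})} \right\rangle_{\mathrm{rep}} \right\rangle_{\mathrm{dat}} - \left\langle F^{(\mathrm{exp})} \right\rangle_{\mathrm{dat}} \left\langle \left\langle F^{(\mathrm{net})} \right\rangle_{\mathrm{rep}} \right\rangle_{\mathrm{dat}}}{\sigma_s^{(\mathrm{exp})}\, \sigma_s^{(\mathrm{net})}}\,. \qquad \text{(B.11)}$$

We define analogously $\langle\rho^{(\mathrm{net})}\rangle_{\mathrm{dat}}$ and $\langle\mathrm{cov}^{(\mathrm{net})}\rangle_{\mathrm{dat}}$. 
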

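On top of these, one has also to define the estimators that measure the global quality 
of the fit, namely the total error 

$$\chi^2_{\mathrm{tot}} = \frac{1}{N_{\mathrm{dat}}} \sum_{i=1}^{N_{\mathrm{dat}}} \frac{\left( F_i^{(\mathrm{exp})} - \left\langle F_i^{(\mathrm{net})} \right\rangle_{\mathrm{rep}} \right)^2}{\sigma_{\mathrm{tot},i}^2}\,, \qquad \text{(B.12)}$$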



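and the average error over the replica sample, 

$$\left\langle \chi^2 \right\rangle = \frac{1}{N_{\mathrm{rep}}} \sum_{k=1}^{N_{\mathrm{rep}}} \frac{1}{N_{\mathrm{dat}}} \sum_{i=1}^{N_{\mathrm{dat}}} \frac{\left( F_i^{(\mathrm{art})(k)} - F_i^{(\mathrm{net})(k)} \right)^2}{\sigma_{\mathrm{tot},i}^2}\,. \qquad \text{(B.13)}$$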



On general grounds [7] one expects the relation $\langle\chi^2\rangle \sim \chi^2_{\mathrm{tot}} + 1$ to hold, and indeed this is 
the case as can be seen in Tables 4 and 5. 
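
The origin of this relation can be illustrated with a toy calculation. The sketch below is not the fit used in this work: it assumes uncorrelated errors $\sigma_{\mathrm{tot},i}$, Gaussian fluctuations, and an idealized situation in which every network reproduces the underlying law exactly. The data fluctuate once around the law, while each replica fluctuates a second time around the data, which is where the extra unit comes from. All names and numbers are hypothetical.

```python
import random


def chi2_total(f_exp, sigma_tot, f_net):
    """Total error: chi2 of the replica-averaged prediction against the data."""
    n_rep, n_dat = len(f_net), len(f_exp)
    total = 0.0
    for i in range(n_dat):
        mean_net_i = sum(f_net[k][i] for k in range(n_rep)) / n_rep
        total += ((f_exp[i] - mean_net_i) / sigma_tot[i]) ** 2
    return total / n_dat


def chi2_average(f_art, sigma_tot, f_net):
    """Average error: chi2 of each net against its own replica, averaged over replicas."""
    n_rep, n_dat = len(f_net), len(sigma_tot)
    total = 0.0
    for k in range(n_rep):
        chi2_k = sum(
            ((f_art[k][i] - f_net[k][i]) / sigma_tot[i]) ** 2 for i in range(n_dat)
        )
        total += chi2_k / n_dat
    return total / n_rep


rng = random.Random(0)  # fixed seed so the toy is reproducible
n_dat, n_rep = 200, 200

# Hypothetical underlying law and errors (assumptions for this toy only).
truth = [1.0 + 0.01 * i for i in range(n_dat)]
sigma = [0.05] * n_dat

# Data: one Gaussian fluctuation around the underlying law.
f_exp = [t + rng.gauss(0.0, s) for t, s in zip(truth, sigma)]
# Replicas: a further Gaussian fluctuation around the data.
f_art = [[d + rng.gauss(0.0, s) for d, s in zip(f_exp, sigma)] for _ in range(n_rep)]
# Idealized nets: each one returns the underlying law exactly.
f_net = [list(truth) for _ in range(n_rep)]

chi2_tot = chi2_total(f_exp, sigma, f_net)
chi2_avg = chi2_average(f_art, sigma, f_net)
```

In this idealized setting `chi2_tot` comes out close to 1 and `chi2_avg` close to 2, up to statistical fluctuations that decrease with the number of data points and replicas.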